Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Authors
Abstract
Visual attention in Visual Question Answering (VQA) aims to locate the right image regions for answer prediction, offering a powerful technique to promote multi-modal understanding. However, recent studies have pointed out that the image regions highlighted by visual attention are often irrelevant to the given question and answer, leading to model confusion and hindering correct visual reasoning. To tackle this problem, existing methods mostly resort to aligning the visual attention weights with human attentions. Nevertheless, gathering such human data is laborious and expensive, making it burdensome to adapt well-developed models across datasets. To address this issue, in this article we devise a novel visual attention regularization approach, namely AttReg, for better visual grounding in VQA. Specifically, AttReg first identifies the image regions that are essential for question answering yet unexpectedly ignored (i.e., assigned low attention weights) by the backbone model. A mask-guided learning scheme is then leveraged to regularize the visual attention to focus more on these ignored key regions. The proposed method is very flexible and model-agnostic: it can be integrated into most visual attention-based VQA models and requires no human attention supervision. Extensive experiments over three benchmark datasets, including VQA-CP v2 and VQA-CP v1, have been conducted to evaluate the effectiveness of AttReg. As a by-product, when incorporating AttReg into the strong baseline LMH, our approach achieves a new state-of-the-art accuracy of 60.00%, with an absolute performance gain of 7.01%, on the VQA-CP v2 dataset. In addition to the effectiveness validation, we recognize that the faithfulness of visual attention in VQA has not been well explored in the literature. In light of this, we propose to empirically validate this property of visual attention and compare it with the prevalent gradient-based approaches.
Similar resources
Visual Pattern Image Coding by a Morphological Approach (RESEARCH NOTE)
This paper presents an improvement of the Visual Pattern image coding (VPIC) scheme presented by Chen and Bovik in [2] and [3]. The patterns in this improved scheme are defined by morphological operations and classified by absolute error minimization. The improved scheme identifies more uniform blocks and reduces the noise effect. Therefore, it improves the compression ratio and image quality i...
A geometric approach for color image regularization
We present a new vectorial total variation method that addresses the problem of color consistent image filtering. Our approach is inspired from the double-opponent cell representation in the human visual cortex. Existing methods of vectorial total variation regularizers have insufficient (or no) coupling between the color channels and thus may introduce color artifacts. We address this problem ...
A Geometric Approach to Color Image Regularization
We present a new vectorial total variation method that addresses the problem of color consistent image filtering. Our approach is inspired from the double-opponent cell representation in the human visual cortex. Existing methods of vectorial total variation regularizers have insufficient (or no) coupling between the color channels and thus may introduce color artifacts. We address this problem ...
Learning to Answer Questions from Image Using Convolutional Neural Network
In this paper, we propose to employ the convolutional neural network (CNN) for learning to answer questions from the image. Our proposed CNN provides an end-to-end framework for learning not only the image representation and the composition model for the question, but also the intermodal interaction between the image and question, for the generation of the answer. More specifically, the proposed model cons...
Image retrieval using visual attention
Author: Liam M. Mayron. Title: Image retrieval using visual attention. Institution: Florida Atlantic University. Dissertation Advisor: Dr. Oge Marques. Degree: Doctor of Philosophy. Year: 2008. The retrieval of digital images is hindered by the semantic gap. The semantic gap is the disparity between a user’s high-level interpretation of an image and the information that can be extracted from an image...
Journal
Journal title: ACM Transactions on Multimedia Computing, Communications, and Applications
Year: 2022
ISSN: 1551-6857, 1551-6865
DOI: https://doi.org/10.1145/3498340